Comb to Pipeline: Fast Software Encryption Revisited
نویسندگان
چکیده
AES-NI, or Advanced Encryption Standard New Instructions, is an extension of the x86 architecture proposed by Intel in 2008. With a pipelined implementation utilizing AES-NI, parallelizable modes such as AES-CTR become extremely efficient. However, out of the four non-trivial NIST-recommended encryption modes, three are inherently sequential: CBC, CFB, and OFB. This inhibits the advantage of using AES-NI significantly. Similar observations apply to CMAC, CCM and a great deal of other modes. We address this issue by proposing the comb scheduler – a fast scheduling algorithm based on an efficient look-ahead strategy, featuring a low overhead – with which sequential modes profit from the AES-NI pipeline in real-world settings by filling it with multiple, independent messages. We apply the comb scheduler to implementations on Haswell, Intel’s latest microarchitecture, for a wide range of modes. We observe a drastic speed-up of factor 5 for NIST’s CBC, CFB, OFB and CMAC performing around 0.88 cpb. Surprisingly, contrary to the entire body of previous performance analysis, the throughput of the authenticated encryption (AE) mode CCM gets very close to that of GCM and OCB3, with about 1.64 cpb (vs. 1.63 cpb and 1.51 cpb, resp.), despite Haswell’s heavily improved binary field multiplication. This suggests CCM as an AE mode of choice as it is NIST-recommended, does not have any weak-key issues like GCM, and is royalty-free as opposed to OCB3. Among the CAESAR contestants, the comb scheduler significantly speeds up CLOC/SILC, JAMBU, and POET, with the mostly sequential nonce-misuse resistant design of POET, performing at 2.14 cpb, becoming faster than the wellparallelizable COPA. Finally, this paper provides the first optimized AES-NI implementations for the novel AE modes OTR, CLOC/SILC, COBRA, POET, McOE-G,
منابع مشابه
How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4
This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/bl...
متن کاملHardware Implementation of AES Encryption and Decryption System Based on FPGA
AES algorithm has played an important role in information security field for a long time since Rijndael algorithm was announced as advanced encryption standard. Hardware implementation based on FPGA of AES algorithm has the advantages of fast, flexible, short development cycle, etc. Hardware implementation based on FPGA of AES encryption and decryption system was studied in detail in this paper...
متن کاملHigh-speed architectures for binary-tree based stream ciphers: Leviathan case study
Abstract Real-time applications such as streaming media and voice require encryption algorithms that do not propagate errors and support fast encryption on small blocks. Since IP packets are delivered out-of-order in routed networks it is difficult to synchronize the source and the destination, therefore requiring encryption algorithms to support out-of-order generation of key stream. In this p...
متن کامل